The classical result of Erdős and Rényi asserts that the random graph G(n,p) experiences a
sharp phase transition around $p = \frac{1}{n}$ - for any $\epsilon > 0$ and $p = \frac{1-\epsilon}{n}$, all connected components
of G(n,p) are typically of size $O_\epsilon(\log n)$, while for $p = \frac{1+\epsilon}{n}$, with high probability there exists
a connected component of size linear in n. We provide a very simple proof of this fundamental
result; in fact, we prove that in the supercritical regime $p = \frac{1+\epsilon}{n}$, the random graph G(n,p)
typically contains a path of linear length. We also discuss applications of our technique to other
random graph models and to positional games.



1 Introduction

In their groundbreaking paper [8] from 1960, Paul Erdős and Alfréd Rényi made the following
fundamental discovery: the random graph G(n,p) undergoes a remarkable phase transition around
the edge probability $p(n) = \frac{1}{n}$. For any constant $\epsilon > 0$, if $p = \frac{1-\epsilon}{n}$, then G(n,p) has whp
(with high probability, that is, with probability tending to 1 as n tends to infinity)
all connected components of size at most logarithmic in n, while for $p = \frac{1+\epsilon}{n}$ whp a connected
component of linear size, usually called the giant component, emerges in G(n,p) (they also showed
that whp there is a unique linear sized component). The Erdős-Rényi paper, which launched the
modern theory of random graphs, has had enormous influence on the development of the field and
is generally considered to be the single most important paper in Probabilistic Combinatorics, if not
in all of Combinatorics.

There are now several proofs available for this result. Erdős and Rényi (who actually worked
in the model G(n, m) of random graphs) used counting arguments. Some of the later proofs relied
on the machinery of branching processes. As one can expect for a result of this importance,
there have been countless ramifications and extensions proven over the years, and by
now the evolution of random graphs is very well understood. We refer the reader to the standard
sources in the theory of random graphs [10], [7] for a detailed account.

In 1981, Ajtai, Komlós and Szemerédi proved [1] that in the supercritical regime $p = \frac{1+\epsilon}{n}$, not
only does the random graph G(n,p) contain whp a linear sized connected component, but it typically
also has a path of length linear in n.

The purpose of this note is to present a very simple and self-contained proof of the Erdős-Rényi
result, as well as of the result of Ajtai, Komlós and Szemerédi. We do not strive to derive the best
possible absolute constants, aiming rather for simplicity.

Our notation is fairly standard. We set $N = \binom{n}{2}$. Floor and ceiling signs will be systematically
omitted for the sake of clarity of presentation.

2 Main result 

Our argument will utilize the notion of the Depth First Search (DFS). This is a well known graph 
exploration algorithm, and we thus will describe it rather briefly. 

Recall that the DFS (Depth First Search) is a graph search algorithm that visits all vertices of
a (directed or undirected) graph G = (V, E) as follows. It maintains three sets of vertices, letting
S be the set of vertices whose exploration is complete, T be the set of unvisited vertices, and
$U = V \setminus (S \cup T)$, where the vertices of U are kept in a stack (the last in, first out data structure).
It is also assumed that some order $\sigma$ on the vertices of G is fixed, and the algorithm prioritizes
vertices according to $\sigma$. The algorithm starts with $S = U = \emptyset$ and T = V, and runs till $U \cup T = \emptyset$.
At each round of the algorithm, if the set U is non-empty, the algorithm queries T for neighbors
of the last vertex v that has been added to U, scanning T according to $\sigma$. If v has a neighbor u
in T, the algorithm deletes u from T and inserts it into U. If v does not have a neighbor in T,
then v is popped out of U and is moved to S. If U is empty, the algorithm chooses the first vertex
of T according to $\sigma$, deletes it from T and pushes it into U. In order to complete the exploration
of the graph, whenever the sets U and T have both become empty (at this stage the connected
component structure of G has already been revealed), we make the algorithm query all remaining
pairs of vertices in S = V, not queried before.

Observe that the DFS algorithm starts revealing a connected component C of G at the moment
the first vertex of C gets into (empty beforehand) U and completes discovering all of C when U
becomes empty again. We call a period of time between two consecutive emptyings of U an epoch;
each epoch corresponds to one connected component of G.

The following properties of the DFS algorithm will be relevant to us: 

• at each round of the algorithm one vertex moves, either from T to U, or from U to S; 

• at any stage of the algorithm, it has been revealed already that the graph G has no edges 
between the current set S and the current set T; 

• the set U always spans a path (indeed, when a vertex u is added to U, it happens because u 
is a neighbor of the last vertex v in U; thus, u augments the path spanned by U, of which v 
is the last vertex). 

We will run the DFS on a random input G ~ G(n,p), fixing the order $\sigma$ on V(G) = [n] to be
the identity permutation. When the DFS algorithm is fed with a sequence of i.i.d. Bernoulli(p)
random variables $X = (X_i)_{i=1}^N$, so that it gets its i-th query answered positively if $X_i = 1$ and
answered negatively otherwise, the so obtained graph is clearly distributed according to G(n,p).
Thus, studying the component structure of G can be reduced to studying the properties of the
random sequence X. In particular, observe crucially that as long as $T \ne \emptyset$, every positive answer
to a query results in a vertex being moved from T to U, and thus after t queries, and assuming $T \ne \emptyset$
still, we have $|S \cup U| \ge \sum_{i=1}^t X_i$. (The last inequality is strict in fact as the first vertex of each
connected component is moved from T to U "for free", i.e., without need to get a positive answer
to a query.) On the other hand, since the addition of every vertex, but the first one in a connected
component, to U is caused by a positive answer to a query, we have at time t: $|U| \le 1 + \sum_{i=1}^t X_i$.
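
The exploration and its coupling with the random bits can be sketched in Python as follows. This is an illustrative sketch only, not the authors' code: the function name dfs_on_gnp, its return values, and the lazy generation of the answers (one fresh Bernoulli(p) draw per query instead of a pre-generated sequence X) are assumptions made here. The final stage, in which the remaining pairs inside S are queried, is omitted, since it does not affect the components.

```python
import random


def dfs_on_gnp(n, p, seed=0):
    """Run the S/T/U depth first search on G(n, p), drawing one bit per query.

    Returns (component_sizes, longest_stack, queries), where longest_stack is
    the largest number of vertices ever held in U at one time.
    """
    rng = random.Random(seed)
    in_T = [True] * n      # in_T[v] is True while v is still unvisited
    scan = [0] * n         # scan[v]: next vertex that v will examine (identity order)
    U = []                 # the stack; its vertices always span a path
    next_root = 0          # first candidate for starting the next epoch
    component_sizes = []
    longest_stack = 0
    queries = 0

    while True:
        if not U:
            # Start a new epoch with the first vertex of T, "for free".
            while next_root < n and not in_T[next_root]:
                next_root += 1
            if next_root == n:
                break
            in_T[next_root] = False
            U.append(next_root)
            component_sizes.append(1)

        v = U[-1]
        found = False
        while scan[v] < n:
            u = scan[v]
            scan[v] += 1
            if in_T[u]:
                queries += 1              # one query = one Bernoulli(p) bit
                if rng.random() < p:      # positive answer: u moves from T to U
                    in_T[u] = False
                    U.append(u)
                    component_sizes[-1] += 1
                    found = True
                    break
        longest_stack = max(longest_stack, len(U))
        if not found:
            U.pop()                       # v has no neighbor in T: v moves to S

    return component_sizes, longest_stack, queries
```

For instance, dfs_on_gnp(2000, 1.2 / 2000) returns the component sizes in order of discovery together with the largest size that U reached, the quantity that the path argument is concerned with.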

The probabilistic part of our argument is provided by the following quite simple lemma. 

Lemma 1 Let $\epsilon > 0$ be a small enough constant. Consider the sequence $X = (X_i)_{i=1}^N$ of i.i.d.
Bernoulli random variables with parameter p.

1. Let $p = \frac{1-\epsilon}{n}$. Let $k = \frac{7}{\epsilon^2} \ln n$. Then whp there is no interval of length kn in [N], in which at
least k of the random variables $X_i$ take value 1.

2. Let $p = \frac{1+\epsilon}{n}$. Let $N_0 = \frac{\epsilon n^2}{2}$. Then whp

\[ \left| \sum_{i=1}^{N_0} X_i - \frac{\epsilon(1+\epsilon)n}{2} \right| \le n^{2/3} . \]

Proof. 1) For a given interval I of length kn in [N], the sum $\sum_{i \in I} X_i$ is distributed binomially
with parameters kn and p. Applying the standard Chernoff-type bound (see, e.g., Theorem A.1.11
of [2]) to the upper tail of B(kn,p), and then the union bound, we see that the probability of the
existence of an interval violating the assertion of the lemma is at most

\[ (N - k + 1) \Pr[B(kn,p) \ge k] < n^2 \cdot e^{-\frac{\epsilon^2}{3}(1-\epsilon)k} \le n^2 \cdot e^{-\frac{\epsilon^2(1-\epsilon)}{3} \cdot \frac{7}{\epsilon^2} \ln n} = o(1) , \]

for small enough $\epsilon > 0$.

2) The sum $\sum_{i=1}^{N_0} X_i$ is distributed binomially with parameters $N_0$ and p. Hence, its expectation is
$N_0 p = \frac{\epsilon n^2 p}{2} = \frac{\epsilon(1+\epsilon)n}{2}$, and its standard deviation is of order $\sqrt{n}$. Applying the Chebyshev inequality,
we get the required estimate.
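
For the reader's convenience, here is how the two estimates can be checked by hand. This is a routine calculation added for illustration, assuming $k = \frac{7}{\epsilon^2} \ln n$ in Part 1 and $N_0 = \frac{\epsilon n^2}{2}$ in Part 2. In Part 1, the final bound is a power of n whose exponent is negative exactly when $\epsilon < \frac{1}{7}$:

\[ n^2 \cdot e^{-\frac{\epsilon^2(1-\epsilon)}{3} \cdot \frac{7}{\epsilon^2} \ln n} = n^{2 - \frac{7(1-\epsilon)}{3}} , \qquad 2 - \frac{7(1-\epsilon)}{3} < 0 \iff \epsilon < \frac{1}{7} . \]

In Part 2, the variance of the binomial sum is at most its expectation, so the Chebyshev inequality gives

\[ \Pr\left[ \left| \sum_{i=1}^{N_0} X_i - N_0 p \right| > n^{2/3} \right] \le \frac{N_0 p (1-p)}{n^{4/3}} \le \frac{\epsilon(1+\epsilon)n}{2 n^{4/3}} = O(n^{-1/3}) = o(1) . \]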

Now we are ready to formulate and to prove our main result.

Theorem 1 Let $\epsilon > 0$ be a small enough constant. Let G ~ G(n,p).

1. Let $p = \frac{1-\epsilon}{n}$. Then whp all connected components of G are of size at most $\frac{7}{\epsilon^2} \ln n$.

2. Let $p = \frac{1+\epsilon}{n}$. Then whp G contains a path of length at least $\frac{\epsilon^2 n}{5}$.

In both cases, we run the DFS algorithm on G ~ G(n,p), and assume that the sequence $X = (X_i)_{i=1}^N$
of random variables, defining the random graph G ~ G(n,p) and guiding the DFS algorithm,
satisfies the corresponding part of Lemma 1.

Proof. 1) Assume to the contrary that G contains a connected component C with more than
$k = \frac{7}{\epsilon^2} \ln n$ vertices. Let us look at the epoch of the DFS when C was created. Consider the
moment inside this epoch when the algorithm has found the (k + 1)-st vertex of C and is about to
move it to U. Denote $\Delta S = S \cap C$ at that moment. Then $|\Delta S \cup U| = k$, and thus the algorithm
got exactly k positive answers to its queries to random variables $X_i$ during the epoch, with each
positive answer being responsible for revealing a new vertex of C, after the first vertex of C was put
into U in the beginning of the epoch. At that moment during the epoch only pairs of vertices touching
$\Delta S \cup U$ have been queried, and the number of such pairs is therefore at most $\binom{k}{2} + k(n - k) < kn$.
It thus follows that the sequence X contains an interval of length at most kn with at least k 1's
inside - a contradiction to Property 1 of Lemma 1.
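
The count of queried pairs used in this part of the proof can be verified directly. The following one-line computation is added for illustration: pairs inside a set of k vertices, plus pairs between that set and the remaining n - k vertices, give

\[ \binom{k}{2} + k(n-k) = \frac{k(k-1)}{2} + kn - k^2 = kn - \frac{k(k+1)}{2} < kn . \]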

2) Assume that the sequence X satisfies Property 2 of Lemma 1. We claim that after the first
$N_0 = \frac{\epsilon n^2}{2}$ queries of the DFS algorithm, the set U contains at least $\frac{\epsilon^2 n}{5}$ vertices (with the contents
of U forming a path of desired length at that moment). Observe first that $|S| < \frac{n}{3}$ at time $N_0$.
Indeed, if $|S| \ge \frac{n}{3}$, then let us look at a moment t where $|S| = \frac{n}{3}$ (such a moment surely exists as
vertices flow to S one by one). At that moment $|U| \le 1 + \sum_{i=1}^t X_i < \frac{n}{3}$ by Property 2 of Lemma 1.
Then $|T| = n - |S| - |U| \ge \frac{n}{3}$, and the algorithm has examined all $|S| \cdot |T| \ge \frac{n^2}{9} > N_0$ pairs between
S and T (and found them to be non-edges) - a contradiction. Let us return to time $N_0$. If $|S| < \frac{n}{3}$
and $|U| < \frac{\epsilon^2 n}{5}$ then we have $T \ne \emptyset$. This means in particular that the algorithm is still revealing
the connected components of G, and each positive answer it got resulted in moving a vertex from
T to U (some of these vertices may have already moved further from U to S). By Property 2 of
Lemma 1 the number of positive answers at that point is at least $\frac{\epsilon(1+\epsilon)n}{2} - n^{2/3}$. Hence we have
$|S \cup U| \ge \frac{\epsilon(1+\epsilon)n}{2} - n^{2/3}$. If $|U| \le \frac{\epsilon^2 n}{5}$, then $|S| \ge \frac{\epsilon n}{2} + \frac{3\epsilon^2 n}{10} - n^{2/3}$. All $|S||T| \ge |S| \left( n - |S| - \frac{\epsilon^2 n}{5} \right)$
pairs between S and T have been probed by the algorithm (and answered in the negative). We
thus get:

\[ \frac{\epsilon n^2}{2} = N_0 \ge |S| \left( n - |S| - \frac{\epsilon^2 n}{5} \right) \ge \left( \frac{\epsilon n}{2} + \frac{3\epsilon^2 n}{10} - n^{2/3} \right) \left( n - \frac{\epsilon n}{2} - \frac{\epsilon^2 n}{2} + n^{2/3} \right) \ge \frac{\epsilon n^2}{2} + \frac{\epsilon^2 n^2}{20} - O(\epsilon^3) n^2 > \frac{\epsilon n^2}{2} \]

(we used the assumption $|S| < \frac{n}{3}$), and this is obviously a contradiction, completing the proof.
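
The two regimes of Theorem 1 can also be observed numerically. The following Python sketch is an illustration only and is not part of the proof: it samples G(n,p) directly and finds the largest component with a union-find structure rather than with the DFS. The name largest_component and the parameter choices are assumptions made here.

```python
import random


def largest_component(n, p, seed=0):
    """Sample G(n, p) pair by pair and return the size of a largest component."""
    rng = random.Random(seed)
    parent = list(range(n))   # union-find forest
    size = [1] * n            # size[r] is valid when r is a root

    def find(x):
        # Find the root of x, halving the path on the way up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:          # the pair {i, j} is an edge
                ri, rj = find(i), find(j)
                if ri != rj:
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri       # attach the smaller tree to the larger
                    size[ri] += size[rj]

    return max(size[find(v)] for v in range(n))
```

With n = 1000 and $\epsilon = 1/2$, one may compare largest_component(n, (1 - 0.5) / n) with the bound $\frac{7}{\epsilon^2} \ln n$, which is about 193 here, and largest_component(n, (1 + 0.5) / n) with n.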



3 Discussion 



1. Observe that using a Chernoff-type bound for the tails of the binomial random variable instead
of the Chebyshev inequality would allow us to claim in the second part of Lemma 1 that the sum
$\sum_{i=1}^{N_0} X_i$ is close to $\frac{\epsilon(1+\epsilon)n}{2}$ with probability exponentially close to 1. This would show in turn,
employing the argument of Theorem 1, that G(n,p) with $p = \frac{1+\epsilon}{n}$ contains a path of length linear
in n with exponentially high probability, namely, with probability $1 - \exp\{-c(\epsilon)n\}$.

2. The dependencies on $\epsilon$ in both parts of Theorem 1 are of the correct order of magnitude - for
$p = \frac{1-\epsilon}{n}$ a largest connected component of G(n,p) is known to be whp of size $\Theta(\epsilon^{-2}) \log n$ (see, e.g.,
Cors. 5.8 and 5.11 of [7]), while for $p = \frac{1+\epsilon}{n}$ a longest cycle of G(n,p) is whp of length $\Theta(\epsilon^2) n$ (see,
e.g., Th. 5.17 of [10]); the standard trick of sprinkling further random edges with edge probability
$p' = o(n^{-1})$ shows that if G(n,p) contains whp a path of length $\alpha n$ for some constant $\alpha > 0$, then
G(n,p + p') contains whp a cycle of length at least $(\alpha - o(1))n$. Note also that although we stated
our result in Theorem 1 for a constant $\epsilon > 0$, our argument is in fact valid for $\epsilon = \epsilon(n) \to 0$ as
well, with a bit more careful treatment of the error terms in our proofs. Actually we can take $\epsilon(n)$
to be as low as $\epsilon \gg n^{-1/3} \log^{1/3} n$ in our arguments (including the theorem in the next remark) -
which nearly borders the critical window $\epsilon = \Theta(n^{-1/3})$.

3. The giant component itself in the regime $p = \frac{1+\epsilon}{n}$, $\epsilon > 0$ a constant, is known to be substantially
larger typically than a longest path - it has whp $\Theta(\epsilon) n$ vertices (see, e.g., Th. 5.4 of [10]). Using
very similar techniques, we can show the probable existence of a connected component of size $\Omega(\epsilon) n$
in this range, as given by the following theorem.

Theorem 2 Let $p = \frac{1+\epsilon}{n}$, for $\epsilon > 0$ a small enough constant. Let G ~ G(n,p). Then whp G has
a connected component with at least $\frac{\epsilon n}{2}$ vertices.

Proof. The proof is quite similar to that of Theorem 1, and therefore we will allow ourselves to
be rather concise. Here too we run the DFS algorithm on G ~ G(n,p) and feed it with a sequence
$X = (X_i)_{i=1}^N$ of i.i.d. Bernoulli(p) random variables. Denote as before $N_0 = \frac{\epsilon n^2}{2}$. We will need
the following typical properties of the sequence X, slightly generalizing those stated in Part 2 of
Lemma 1 and provable using the same Chernoff-type estimates:

1. $\sum_{i=1}^{n^{7/4}} X_i \le n^{5/6}$;

2. For every $n^{7/4} \le t \le N_0$, $\left| \sum_{i=1}^{t} X_i - \frac{(1+\epsilon)t}{n} \right| \le n^{2/3}$.

Let us assume now that the sequence X satisfies the above stated properties. We claim that after
the first $N_0$ queries of the DFS algorithm, we are in the midst of revealing a connected component
whose size is at least $\frac{\epsilon n}{2}$. Just as in the proof of Theorem 1 we have that $|S| < \frac{n}{3}$ at time $N_0$, and T
is still non-empty. It follows that at any moment $n^{7/4} \le t \le N_0$ we have: $|S \cup U| \ge \frac{(1+\epsilon)t}{n} - n^{2/3}$.
If at some moment t in this interval the set U becomes empty, the algorithm has asked all queries
between the set S and its complement T = [n] - S, implying:

\[ t \ge |S|(n - |S|) \ge \left( \frac{(1+\epsilon)t}{n} - n^{2/3} \right) \left( n - \frac{(1+\epsilon)t}{n} + n^{2/3} \right) \ge (1+\epsilon)t - \frac{(1+\epsilon)^2 t^2}{n^2} - 2n^{5/3} \]

\[ \ge (1+\epsilon) \left( 1 - (1+\epsilon)\frac{\epsilon}{2} \right) t - 2n^{5/3} = (1+\epsilon) \left( 1 - \frac{\epsilon}{2} - \frac{\epsilon^2}{2} \right) t - 2n^{5/3} > t \]

- a contradiction, for small enough $\epsilon > 0$. (We used $|S| < \frac{n}{3}$ in the above estimate.) Hence U is
never empty in the interval $[n^{7/4}, N_0]$. It follows that all vertices added to U during this interval
(of which some may have migrated further to S) are in the same connected component, and their
number is, by the properties of X stated above,

\[ \sum_{n^{7/4} < i \le N_0} X_i \ge \frac{(1+\epsilon)N_0}{n} - n^{2/3} - n^{5/6} \ge \frac{(1+\epsilon)\epsilon n}{2} - 2n^{5/6} > \frac{\epsilon n}{2} . \]

All these vertices belong to the same connected component - whose size is then at least $\frac{\epsilon n}{2}$,
completing the proof.

4. As we have already mentioned, the DFS algorithm is applicable equally well to directed graphs. 
Hence essentially the same argument as above, with obvious minor changes, can be applied to the 
model D(n,p) of random digraphs. In this model, the vertex set is [n], and each of the n(n - 1)
ordered pairs (i, j), $1 \le i \ne j \le n$, is a directed edge of D ~ D(n,p) with probability p = p(n),
independently of the other pairs. In particular we can obtain the following theorem:

Theorem 3 Let $p = \frac{1+\epsilon}{n}$, for $\epsilon > 0$ constant. Then the random digraph D(n,p) has whp a directed
path and a directed cycle of length $\Theta(\epsilon^2) n$.

This recovers the classical result of Karp [11] for the model D(n,p). 

5. The technique of Theorem 1 can be applied to further models of random graphs and digraphs.
One immediate application is to random subgraphs of graphs of large minimum degree. We have
the following theorem.

Theorem 4 Let G be a finite graph with minimum degree at least n. Let $p = \frac{1+\epsilon}{n}$, for $\epsilon > 0$
constant. Form a random subgraph $G_p$ of G by including every edge of G into $G_p$ independently
and with probability p. Then whp $G_p$ has a path of length at least $\frac{\epsilon^2 n}{5}$.

The proof is essentially identical to that of Theorem 1. We run the DFS process on $G_p$ and
feed it with a sequence $X = (X_i)_{i=1}^N$ of i.i.d. Bernoulli(p) random variables, where N = |E(G)|.
For the proof, we need only to notice that at any time the number of edges of G between S and
T can be estimated from below by $|S|(\delta(G) - |S| - |U|) \ge |S|(n - |S| - |U|)$; the rest of the proof
is the same. Notice that getting a long cycle appears to be a much more challenging task in this
setting - the base graph G can be of girth (much) larger than n, and therefore sprinkling does not
necessarily help (immediately) to turn a long path into a long cycle whp.

6. Another example of applying our technique is random subgraphs of pseudo-random graphs. Let
G be an $(n, d, \lambda)$-graph (a d-regular graph on n vertices, in which all eigenvalues of the adjacency
matrix, but the first one, are at most $\lambda$ in their absolute values - see, e.g. [12] for a thorough
discussion of this notion). It is well known that requiring $\lambda \ll d$ is enough to guarantee many
pseudo-random properties of such a graph. The model of taking a random subgraph $G_p$ of an
$(n, d, \lambda)$-graph G has been considered by Frieze, Krivelevich and Martin in [9]. It is proven in [9]
that, assuming $\lambda \ll d$, for $p = \frac{1+\epsilon}{d}$ the random subgraph $G_p$ of an $(n, d, \lambda)$-graph G has whp
a unique connected component of size linear in n. We can apply the technique of Theorem 1 to
prove the following:

Theorem 5 Let G be an $(n, d, \lambda)$-graph with $\lambda = o(d)$. Let $p = \frac{1+\epsilon}{d}$, for $\epsilon > 0$ constant. Then the
random subgraph $G_p$ contains whp a path of length $\Theta(\epsilon^2) n$.

Here is a very brief sketch of the proof. We run the DFS algorithm on $G_p$ till it queries $\frac{\epsilon n d}{2}$ edges
of G. Similarly to Lemma 1, it gets whp about $\frac{\epsilon(1+\epsilon)n}{2}$ positive answers during this period, when fed
with a string of i.i.d. Bernoulli($\frac{1+\epsilon}{d}$) random variables. In order for the proof analogous to that of
Theorem 1 to go through, one only needs to be able to control the number of edges between any
two linear sized vertex subsets S, T in G. Such a control is indeed available for $(n, d, \lambda)$-graphs - it
is known that if G is an $(n, d, \lambda)$-graph, then for any two vertex subsets $S, T \subseteq V(G)$ the number
$e_G(S,T)$ of edges of G with one endpoint in S and another in T satisfies:

\[ \left| e_G(S,T) - \frac{d}{n} |S| |T| \right| \le \lambda \sqrt{|S| |T|} \]

(see, e.g. Corollary 9.2.5 of [2] or Theorem 2.11 of [12]). Assuming $\lambda \ll d$ is enough therefore
to guarantee that $e_G(S,T) = (1 + o(1)) \frac{d}{n} |S| |T|$ in such a graph, and the proof for the random
subgraph proceeds as in Theorem 1. Here too sprinkling helps to turn a long path into a long cycle
whp - we first get whp a linearly long path and then argue that due to the above estimate on the
edge distribution of G there are $\Omega(dn)$ edges between the prefix and the suffix of the path, and one
of them will whp fall into a sprinkled graph, thus closing a long cycle.
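
To see why the edge distribution estimate is strong enough for linear sized sets, one can compare its error term with its main term. The following short computation is added for illustration; the constants s and t are names introduced here, with |S| = sn and |T| = tn for fixed s, t > 0:

\[ \frac{\lambda \sqrt{|S| |T|}}{\frac{d}{n} |S| |T|} = \frac{\lambda n}{d \sqrt{|S| |T|}} = \frac{\lambda}{d \sqrt{st}} = o(1) \quad \text{whenever } \lambda = o(d) . \]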

7. Yet another application of our proof strategy is to positional games. The following game C(n, b)
was considered by Bednarska and Łuczak in [3]. The game is played between two players, Maker and
Breaker, alternately claiming 1 and b edges, respectively, of the complete graph $K_n$ on n vertices, till
all edges of $K_n$ have been claimed by either of the players. Maker's goal is to maximize the number
of vertices in a largest connected component in her graph by the end of the game, while Breaker aims
to make it as small as possible. Bednarska and Łuczak discovered the following phase transition
phenomenon, obviously reminiscent of the Erdős-Rényi phase transition in random graphs. Let
$\epsilon > 0$ be a constant. If $b = (1 + \epsilon)n$, then Breaker has a strategy to keep all of Maker's connected
components of size $O(1/\epsilon)$. On the other hand, if $b = (1 - \epsilon)n$, then Maker has a strategy to create
a connected component of size $\Theta(\epsilon) n$. We can prove the following result.

Theorem 6 Let $\epsilon > 0$. Then in the game C(n, b) with $b = (1 - \epsilon)n$, Maker has a strategy to create
a path of length $\Omega(\epsilon^2) n$.

The winning strategy of Maker and the proof of its validity are fairly similar to the proof of Theorem
1. Maker maintains three sets S, U, T partitioning [n], starting with $S = \emptyset$, and U being an arbitrary
vertex from [n]. She makes sure that the set U always spans a path of her edges at any stage of the
game. At each Maker's turn, she finds the last vertex v along the path in U for which there exists
an unclaimed edge (v, u) with $u \in T$, shifts all further vertices after v along U into S and claims
the edge (v, u), moving u from T to U. If no such vertex is available along the current path in U,
Maker moves all of its vertices into S, loads U with an arbitrary vertex u from T and then proceeds
as described before. One can observe that, similarly to the analysis of the DFS algorithm, at any
stage of the game all edges between the current set S and the current set T have been claimed by
Breaker. Now, look at the situation in the game after $\frac{\epsilon n}{2}$ rounds. At that point $|S \cup U| \ge \frac{\epsilon n}{2}$. If
one has $|U| \le \frac{\epsilon^2 n}{5}$, then all $|S| \cdot |T|$
edges between S and T have been claimed by Breaker - a contradiction, for small enough $\epsilon > 0$.
The situation with making a cycle is quite different here - it has been shown by Bednarska and
Pikhurko [4] that if b = b(n) is such that Maker completes the game with at most n - 1 edges, then
Breaker has a strategy to force Maker to end up with a tree; thus $b \le (1 + o(1))n/2$ is required for
Maker to create a cycle of any length.

8. Some of the ideas utilized in this paper have already been applied before. In particular, the DFS
algorithm has been used by Ben-Eliezer and the authors in [6] to prove the following statement:
if in a graph G on n vertices there is an edge between every pair of disjoint vertex subsets of size
k, then G contains a path of length n - 2k + 1. This deterministic statement implies readily that
G(n,p) with p = c/n contains whp a path of length $(1 - \alpha(c))n$, where $\alpha(c) \to 0$ as $c \to \infty$. Also,
Benjamini and Schramm [5] used the idea of coupling a graph search algorithm with a sequence X of
random bits, serving as answers to the algorithm's queries, to derive some results about percolation
in expanding graphs.